Patentscope is WIPO's public patent database. It covers Patent Cooperation Treaty (PCT) applications, which WIPO administers, together with a wide range of national and regional collections, including the European Patent Office, the USPTO and Japan. In total it holds 45 million patent documents, including 2.7 million PCT applications.

In this article we cover the basics of using Patentscope to search for and download up to 10,000 records. A detailed User’s Guide covers specific features in more depth. Compared with other free services, Patentscope has the following main strengths.

  1. Full text search in the description and claims of PCT applications on the day of publication, and of patent applications from a range of other collections including the United States, Japan, China and the European Patent Office.
  2. Download up to 10,000 records
  3. Expand search terms into multiple other languages using Cross Lingual Expansion (CLIR)
  4. Simple, Advanced and Combined Field searching
  5. Accessible in multiple languages, with a WIPO Translate text function
  6. Mobile version and HTTPS access
  7. Sequence listing downloads
  8. Green technologies through the IPC Green Inventory
  9. Different types of graphical analysis on results lists on the fly using the Options menu.

Two detailed guides and a series of video tutorials are also available:

  1. Patentscope Search: The User’s Guide.
  2. Patentscope CLIR: the guide to the Cross-Lingual Information Retrieval Tool.
  3. Patentscope video tutorials

The best approach to obtaining patent data from Patentscope is to register for a free account. Without one you will not be able to download data or gain access to the sequence download area. To register for a free account go here.

Results

When we arrive at the results page we can see that we have 24,614 results, with our query displayed as a search of AllTXT in all languages. An RSS button lets us copy the feed to an RSS reader for updates.

There is also a query tree button that displays results by language and terms in the relevant sections of the document. We can see an example of this for a more complex query below.

A video tutorial on the search result list is also available.

Downloading the Results

The two Excel icons at the end of the menu allow a user to download either the short list (first icon) or the second list (second icon) as an .xls file. These icons are displayed only when you are logged in with a user account.

When we download these results we will receive an .xls sheet with up to 10,000 entries with a couple of header rows that show the query. Note that each record in the Excel sheet is hyperlinked to the corresponding record in Patentscope.
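
As a minimal sketch of a first pass over such a download, the snippet below separates the rows that record the query from the records themselves. It assumes the sheet has been re-saved as CSV from a spreadsheet program. The column names, the marker used to find the heading row, and the sample content are all hypothetical stand-ins, not the actual Patentscope layout.

```python
import csv
import io


def read_records(csv_text, header_marker="Title"):
    """Return (preamble_rows, records) from a CSV export.

    Rows before the first row containing `header_marker` are treated as the
    preamble that records the query; that row supplies the column names.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    for index, row in enumerate(rows):
        if header_marker in row:
            columns = row
            # Skip completely empty rows that spreadsheets sometimes emit.
            records = [dict(zip(columns, r)) for r in rows[index + 1:] if any(r)]
            return rows[:index], records
    raise ValueError("no heading row containing %r found" % header_marker)


# Hypothetical sample standing in for a real export.
sample = (
    "Query,ALLTXT:(example)\n"
    "Results,2\n"
    "Publication Number,Title\n"
    "WO2014000001,First example title\n"
    "WO2014000002,Second example title\n"
)
preamble, records = read_records(sample)
print(len(preamble), "preamble rows;", len(records), "records")
```

Locating the heading row by a known column name, rather than by a fixed row count, avoids hard-coding how many header rows a given download contains.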

We will go into the use of this data, including with Tableau Public and other tools, in some depth later, but there are a few things to note here. The first is that the hyperlinked publication number does not include a kind code (A1, B1 etc.). This matters only in the sense that, in other databases, the number will retrieve multiple documents linked to the Patentscope number. A second point is that Patentscope data is raw: it arrives as supplied by the data providers and is largely unprocessed. That means there can be encoding issues, which we will come back to in the discussions on data cleaning.
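
To make the kind code point concrete, here is a small sketch that splits a publication number into country code, serial number and an optional kind code. The pattern is a deliberate simplification and the sample numbers are invented; real national number formats vary more than this allows for.

```python
import re

# Two-letter country code, serial digits, optional kind code such as A1 or B2.
# This is a simplified assumption, not a full model of publication numbers.
PUBLICATION_RE = re.compile(r"^([A-Z]{2})(\d+)([A-Z]\d?)?$")


def split_publication_number(number):
    """Split a publication number into (country, serial, kind_or_None)."""
    # Remove spaces and common separators before matching.
    cleaned = re.sub(r"[\s/.,-]", "", number.upper())
    match = PUBLICATION_RE.match(cleaned)
    if match is None:
        raise ValueError("unrecognised publication number: %r" % number)
    return match.groups()


# Invented numbers: one without a kind code, as in the download, and one with.
print(split_publication_number("WO2014000001"))
print(split_publication_number("WO 2014/000001 A1"))
```

When matching against another database, comparing only the country and serial parts lets a number without a kind code line up with every kind-coded document that shares it.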

What is very useful about Patentscope is that we can obtain a significant volume of data on a topic of interest. This article has simply downloaded the first 10,000 results. To obtain the full result set, it would be easy enough to limit the query by year and download the data as a series of sets that can be combined later (e.g. three sets).
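
One way to plan such a split is sketched below: consecutive years are grouped into batches that each stay within the 10,000 record limit. The yearly counts are invented for illustration (they merely sum to the 24,614 results reported earlier), and `plan_batches` is a hypothetical helper, not a Patentscope feature.

```python
def plan_batches(counts_by_year, limit=10000):
    """Group consecutive years into batches whose totals stay within `limit`.

    Returns a list of (first_year, last_year, total) tuples. A single year
    that exceeds the limit still gets its own batch and would need a
    narrower date range in practice.
    """
    batches = []
    start = end = None
    total = 0
    for year in sorted(counts_by_year):
        count = counts_by_year[year]
        # Close the current batch if adding this year would exceed the limit.
        if start is not None and total + count > limit:
            batches.append((start, end, total))
            start, total = None, 0
        if start is None:
            start = year
        end = year
        total += count
    if start is not None:
        batches.append((start, end, total))
    return batches


# Hypothetical yearly result counts, invented for this example.
counts = {2011: 3000, 2012: 3500, 2013: 3400, 2014: 4100, 2015: 5000, 2016: 5614}
for first, last, total in plan_batches(counts):
    print(first, last, total)
```

With these invented counts the plan comes out as three downloads, each of which fits within a single export.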

Cross Lingual Searching

One challenge in patent searching is that the same concept is expressed differently in different languages. Patentscope presents a very useful solution to this through cross-lingual searching. From the pull-down menu select Cross Lingual Expansion, then enter the search terms and press go. The tool will then generate search terms in multiple languages.

To go further with this tool, use the slider settings (precision vs. recall). For example, if we were to insert the search term “synthetic biology” and move recall to the top level (4), we would generate the following query.

FP:((EN_TI:("synthetic biology" OR "biologic synthetic") OR EN_AB:("synthetic biology" OR "biologic synthetic")) OR (DE_TI:("synthetische Biologie" OR "synthetischen biologischen" OR "biologische synthetische" OR "Biologische synthetische") OR DE_AB:("synthetische Biologie" OR "synthetischen biologischen" OR "biologische synthetische" OR "Biologische synthetische")) OR (ES_TI:("biológicas sintéticas") OR ES_AB:("biológicas sintéticas")) OR (FR_TI:("biologie synthétique" OR "biologie synthéthique") OR FR_AB:("biologie synthétique" OR "biologie synthéthique")) OR (JA_TI:("生物合成" OR "合成生体" OR "の生物学的合成") OR JA_AB:("生物合成" OR "合成生体" OR "の生物学的合成")) OR (ZH_TI:("合成生物") OR ZH_AB:("合成生物")))
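
The structure of this query is regular: for each language, the same terms are searched in the title (TI) and abstract (AB) fields, and the language groups are joined with OR. The sketch below rebuilds a string of that shape from a dictionary of terms. It only mirrors the pattern visible in the query above; the function names are hypothetical, and in practice the terms themselves would come from the CLIR tool.

```python
def field_clause(lang, field, terms):
    """Build one clause such as EN_TI:("a" OR "b")."""
    quoted = " OR ".join('"%s"' % term for term in terms)
    return "%s_%s:(%s)" % (lang, field, quoted)


def expansion_query(terms_by_lang, fields=("TI", "AB")):
    """Combine per-language terms into a single FP:(...) query string."""
    groups = []
    for lang, terms in terms_by_lang.items():
        clauses = " OR ".join(field_clause(lang, f, terms) for f in fields)
        groups.append("(%s)" % clauses)
    return "FP:(%s)" % " OR ".join(groups)


# Terms copied from the English and Spanish parts of the query above.
terms = {
    "EN": ["synthetic biology", "biologic synthetic"],
    "ES": ["biológicas sintéticas"],
}
print(expansion_query(terms))
```

Holding the terms in a dictionary makes it easy to drop a language, or to remove a poor translation, before pasting the query back into the search box.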

If supervised mode is selected from the Expansion mode drop-down, it becomes possible to select technology areas for the generation of terminology. While we haven’t worked through this in detail, it could be very helpful for domain-specific query generation. All in all, this is one of the most original and powerful tools that Patentscope has to offer. A detailed PDF guide to using CLIR is available here.

Sequence Data

A third major feature of Patentscope is access to DNA and amino acid sequence listings filed with PCT Applications. This data can be accessed and downloaded for individual records here.

A sample record from the lists can be seen below as a plain text file. Note that some issues may arise with reconciling the plain text file with the WIPO publication number (WO etc.) and this merits careful attention if using this data.

Registered account holders can also use the anonymous FTP download service from the same page. This provides access to the sequence data by year, as can be seen below.

If using the anonymous FTP service, note that the recent data is measured in gigabytes, so it is not suitable for download over a weak Wi-Fi connection or a gated connection. Nevertheless, the open accessibility of this data is important. For other sequence data sources you may be interested in the European Bioinformatics Institute resources here, for the US by document number here and, until March 2015, the DNA Patent Database here. Also important is the Patseq tool here.

Round Up

WIPO Patentscope is a powerful tool for gaining access to a significant amount of patent data on a topic of interest. The ability to download 10,000 records cannot be beaten by other free tools. The Cross Lingual Searching tool appears to be unique and valuable. Free access to bulk download of sequence data is likely to keep bioinformaticians happy for quite a long time.

One way of thinking about the role of Patentscope in patent analytics is as a resource that can be combined with other data tools. For example, if we wanted to obtain the abstracts, descriptions or claims of PCT documents in Patentscope, we might use the Patentscope numbers to retrieve data from EPO Open Patent Services or Google Patents using R, Python or other tools. In this case Patentscope overcomes the limits on search results in other tools while still allowing targeted use of those tools to retrieve more information. The Cross Lingual Searching tool could also be particularly useful for identifying, and later acquiring, patent documents from other jurisdictions where a company or organisation may be seeking to operate, or for expanding patent landscape analysis into jurisdictions with non-Roman alphabets.

The main difficulties in using Patentscope stem from occasional noise in the data. Patentscope does not clean the data provided from the individual collections, with the exception of checking for typographical errors in priority numbers and IPC codes. In addition, all text is transformed into UTF-8. However, as is common when dealing with diverse data sources, the results are not always perfect. Because Patentscope data is drawn from a wide range of languages, users may also need to update their font libraries if large numbers of unusual characters appear in the data (for example by installing the Asian language pack for Windows). In practice, as with most patent data sources, this can mean significant time is required to clean up the data. Having said this, no other free database tool allows us to download as much data in table form for analysis. As we will see, it is possible to do a lot with Patentscope data.
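
As a hedged illustration of the kind of light cleaning this can involve, the sketch below normalises the Unicode form of a text field, drops stray control characters and collapses whitespace. It is an assumption about a reasonable first step rather than a description of any Patentscope process, and it does not repair characters that were wrongly decoded upstream.

```python
import re
import unicodedata


def clean_text(value):
    """Normalise Unicode form and whitespace in one text field."""
    # Compose characters so that "e" + combining accent becomes a single "é".
    text = unicodedata.normalize("NFC", value)
    # Drop control characters (category Cc) other than ordinary whitespace.
    text = "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Cc" or ch in "\t\n\r"
    )
    # Collapse runs of whitespace, including non-breaking spaces.
    return re.sub(r"\s+", " ", text).strip()


# Invented messy input: double space, decomposed accent, NBSP and a NUL byte.
print(clean_text("biologie  synthe\u0301tique\u00a0\x00 "))
```

Applying a function like this to every text column before analysis makes otherwise identical titles and applicant names compare as equal.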

Learn More